Abstract

We extend probability estimates on the smallest singular value of 
random matrices with independent entries to a class of sparse random 
matrices. We show that one can relax a previously used condition of 
uniform boundedness of the variances from below. This allows us to con- 
sider matrices with null entries or, more generally, with entries having 
small variances. Our results do not assume identical distribution of the 
entries of a random matrix and help to clarify the role of the variances 
of the entries. We also show that it is enough to require boundedness 
from above of the r-th moment, r > 2, of the corresponding entries.

1 Introduction and main results

Let $N \ge n$ be positive integers. In this paper we study the smallest singular
value of $N \times n$ matrices $T = (\xi_{ji})$ whose entries are real-valued random
variables obeying certain probability laws, and furthermore we are interested 
in allowing these matrices to contain some null entries (or, more generally, to 
contain entries with small variances). Thus we deal with sparse (or dilute) 
random matrices. Sparse random matrices and sparse structures play an im- 
portant role, as they arise naturally in many branches of pure and applied 
mathematics. We refer to Chapter 7 of [5] for definitions, relevant discussions,
and references (see also the recent works [14l l3T] ). 

Understanding the properties of random matrices, in particular the behavior
of their singular values (see the definitions in Section 2), is of importance
in several fields, including Asymptotic Geometric Analysis, Approximation
Theory, Probability and Statistics. The study of extreme singular values in
classical random matrix theory concentrates on their limiting behavior as the 
dimension grows to infinity. Such limiting behavior is now well understood for 
various kinds of random matrices whose entries are independent in aggregate, 
or independent up to the symmetry constraints (e.g. hermitian or unitary 
matrices), in many cases even with identical distribution being required. We 
refer to the following books, surveys, and recent papers for history, results, 
and open problems in this direction [11 [51 [91 HU [121 1221 EU [33]. 

In the non-limiting asymptotic case very little was known till very recently. 
In such a case one studies the rate of convergence, deviation inequalities, and 
the general asymptotic behavior of singular values of a matrix as functions of 
the dimensions, assuming that the dimensions are large enough (growing to 
infinity). The Gaussian case, i.e. the case when the entries of the matrix are
independent $\mathcal{N}(0, 1)$ Gaussian, was treated independently in [8] and [29] (see
also [13] for related results, and the survey [7]). In the last decade the attention 
shifted to other models, like matrices with independent subgaussian entries 
(in particular, symmetric Bernoulli ±1 entries), independent entries satisfying 
some moment conditions as well as matrices with independent columns or 
rows satisfying some natural restrictions. Major achievements were obtained 

in [21 El HS1 1251 ESI EH EOl E2] - 

In all previous non-limiting asymptotic results for random matrices with 
independent entries, an important assumption was that the variances of all the 
entries are bounded below by one, i.e. in a sense, that all entries are buffered 
away from zero and thus cannot be too small. Such a condition is not natural 
for some applications, for instance when one deals with models in the theory 
of wireless communications, where signals may be lost (or some small noise 
may appear), or with models in neural network theory, where the neurons are 
not of full connectivity with each other, making sparse random matrices more 
suited in modelling such partially connected systems. 

The main goal of our paper is to show that one can significantly relax the 
condition of boundedness from below of all entries, replacing it by averaging 
type conditions. Thus our paper clarifies the role of the variances in the corre- 
sponding previous results (cf. e.g. [HI [251 ESI EZj). Another advantage of our 
results is that we require only boundedness (from above) of the r-th moments 
for an arbitrary (fixed) r > 2. We would like to emphasize that we don't 
require identical distributions of all entries of a random matrix nor bounded- 
ness of the subgaussian moment of entries (both conditions were crucial for 
deep results of [27]). Moreover, the condition on entries "to be identically dis- 
tributed" is clearly inconsistent with our model, as, under such a condition, if 
one entry is zero then automatically all entries are zeros. 

We describe now our setting and results. Our main results present estimates 
for the smallest singular value $s_n(T)$ of large matrices $T$ of the type described.
It turns out that the methods used to establish those estimates depend on the aspect
ratio of the matrices. The aspect ratio of an $N \times n$ matrix $A$ is the ratio $n/N$
of the number of columns to the number of rows, or, more intuitively, the ratio
"width by height". To have a suggestive terminology, we will say that such a matrix $A$
is

• "tall" if $n/N \le c_0$ for a small positive constant $c_0$,

• "almost square" if $n/N$ is close to 1.

Clearly, a matrix is square when its aspect ratio is equal to 1. 

Since we will deal with random matrices under various conditions, for the
sake of clarity of exposition we now list all our conditions. For parameters $r > 2$,
$\mu \ge 1$, $a_1 > 0$, $a_2 > 0$, $a_3 \in (0, \mu)$, and $a_4 \in (0, 1]$, we will consider $N \times n$
random matrices $T = (\xi_{ji})_{j \le N,\, i \le n}$ whose entries are independent real-valued
centered random variables satisfying the following conditions:

(i) Moments: $\mathbb{E}|\xi_{ji}|^r \le \mu^r$ for all $j$ and $i$.

(ii) Norm: $\mathbb{P}\big(\|T\| > a_1\sqrt{N}\big) \le e^{-a_2 N}$.

(iii) Columns: $\mathbb{E}\|(\xi_{ji})_{j=1}^N\|_2^2 = \sum_{j=1}^N \mathbb{E}\xi_{ji}^2 \ge a_3^2 N$ for each $i$.

For almost square and square matrices we will also need the following condition
on rows.

(iv) Rows: $|\{i : \mathbb{E}\xi_{ji}^2 \ge 1\}| \ge a_4 n$ for each $j$.

Notice that these conditions allow our matrices to contain many null (or small) 
entries, in the sense that we don't impose any restrictions on the variance of 
a particular random variable. Naturally, in order for our random matrices to 
have entries of different kinds, we do not require that the entries are identi- 
cally distributed. Our model is different from the sparse matrix models used 
e.g. in [131 EI], where zeros appeared randomly, i.e. starting from a random 
matrix whose entries have variances bounded away from 0, each entry was 
multiplied by another random variable of type 0/1. Our model is more similar 
to those considered in [9], where a condition similar to (iii) was used for square 
symmetric matrices. 
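
To make conditions (iii) and (iv) concrete, the following minimal sketch checks them for a given variance profile, that is, for the array of values $\mathbb{E}\xi_{ji}^2$. The helper names and the striped profile with periodically placed null entries are assumptions made for illustration only; they are not constructions used in the proofs.

```python
from typing import List


def column_condition(var: List[List[float]], a3: float) -> bool:
    """Condition (iii): each column's variances sum to at least a3^2 * N."""
    N, n = len(var), len(var[0])
    return all(sum(var[j][i] for j in range(N)) >= a3 ** 2 * N for i in range(n))


def row_condition(var: List[List[float]], a4: float) -> bool:
    """Condition (iv): each row has at least a4 * n entries with variance >= 1."""
    n = len(var[0])
    return all(sum(1 for v in row if v >= 1.0) >= a4 * n for row in var)


def striped_profile(N: int, n: int, period: int = 3) -> List[List[float]]:
    """Hypothetical profile: entry (j, i) is null when (j + i) % period == 0,
    and has variance 1 otherwise."""
    return [[0.0 if (j + i) % period == 0 else 1.0 for i in range(n)]
            for j in range(N)]


profile = striped_profile(N=12, n=6)
# Fraction of null entries in the profile.
null_fraction = sum(v == 0.0 for row in profile for v in row) / (12 * 6)
```

In this profile one third of the entries are null, yet condition (iii) holds for every $a_3$ with $a_3^2 \le 2/3$ and condition (iv) holds for every $a_4 \le 2/3$.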

It is important to highlight that the parameters $\mu$, $r$, $a_1$, $a_2$, $a_3$, $a_4$ should
be regarded as constants which do not depend on the dimensions $n$, $N$. Note
also that the ratio $\mu/a_3$ is of particular importance ($\mu$ is responsible for the
maximal $L_r$-norm of entries, while $a_3$ is an average-type substitution for the
lower bound on the $L_2$-norm of entries).

Before stating our main results let us comment on our conditions in more
detail. The first condition is a standard requirement saying that the random
variables are not "too big". For the limiting case it is known that one needs
boundedness of the fourth moments. It turns out that for our estimates it is
enough to ask for boundedness of moments of order $r = 2 + \varepsilon$ only, which improves
all previous results. In particular, this was one of the questions raised in [34],
where the author proved corresponding estimates for entries with bounded $4 + \varepsilon$
moment, and asked about the $2 + \varepsilon$ moment.

The second condition is crucial for many results on random matrices. We 
recall that the norm of an $N \times n$ matrix is understood to be the operator
norm from $\ell_2^n$ to $\ell_2^N$, also called the spectral norm, which is equal to the largest
singular value. In fact, the question "What are the models of random matri- 
ces satisfying condition (ii)?" (and more generally, "What is the behavior of 
the largest singular value?") is one of the central questions in random ma- 
trix theory. Such estimates are well known for the Gaussian and subgaussian 
cases. We refer to [31 EE] and references therein for other models and recent 
developments on this problem. 

We would like to emphasize that condition (ii) is needed in order to get
probabilities exponentially close to one. Alternatively, one may substitute this
condition by

\[ \mathbb{P}\big(\|T\| > a_1\sqrt{N}\big) \le p_0, \]

in which case one should add $p_0$ to the estimates of probabilities in our
theorems below.

The main novelty in our model lies in conditions (iii) and (iv). These two
conditions substitute the standard condition

\[ \mathbb{E}|\xi_{ji}|^2 \ge 1 \quad \text{for all } j, i, \tag{1} \]

which was used in all previous works related to the smallest singular value of a
random matrix (in the non-limiting case). Removing such a strong assumption
on all entries, we allow the possibility of zeros to appear among the entries
of a random matrix. Our conditions (iii) and (iv) should be compared with
the normalization conditions (1.1) and (1.16) in [9]. Our methods are similar
to those used in [18, 27]; however, we deal with a rather different model, and
correspondingly our proofs require much more delicate computations. In
particular, the proof of the key Proposition 4.1, which estimates the probability that
for a fixed vector $x$ the Euclidean norm $\|Tx\|_2$ is small, is much more involved
(cf. the proof of [18, Proposition 3.4] or [26, Corollary 2.7]).

Of course we want to rule out matrices having a column or a row consisting
of zeros only, for if there is a zero column then immediately $s_n(T) = 0$, while
if there is a zero row then the matrix $T$ is essentially of size $(N - 1) \times n$.
Hence we need some general assumptions on the columns and the rows of
the matrices under consideration. Our condition (iii) alone implies that each
column vector of the matrix has relatively big $\ell_2$-norm. Moreover, condition
(iii) together with condition (i) guarantees that proportionally many rows have
$\ell_2$-norms bounded away from 0. It turns out that condition (iii) is already
enough for "tall" matrices, when $N > Cn$, as the first theorem below shows.
The cases of "almost square" and square matrices are more delicate, because N 
becomes closer to n, and we need to control the behavior of rows more carefully. 
Condition (iv) ensures that each row of the matrix has proportionally many 
entries with variance at least one. 

Now we state our results. The first theorem deals with "tall" matrices and
extends the corresponding result from [18] (for uniformly bounded above mean
zero random variables with bounded below variances this was shown in [6]).
Note that we use only three conditions, (i), (ii), and (iii), while condition (iv)
is not required for this result.

Theorem 1.1. Let $r > 2$, $\mu \ge 1$, $a_1, a_2, a_3 > 0$ with $a_3 < \mu$. Let $1 \le n < N$ be
integers, and write $N$ in the form $N = (1 + \delta)n$. Suppose $T$ is an $N \times n$ matrix
whose entries are independent centered random variables such that conditions
(i), (ii) and (iii) are satisfied. There exist positive constants $c_1$, $c_2$ and $\delta_0$
(depending only on the parameters $r$, $\mu$, $a_1$, $a_2$, $a_3$) such that whenever $\delta \ge \delta_0$,
then

\[ \mathbb{P}\big(s_n(T) \le c_1\sqrt{N}\big) \le e^{-c_2 N}. \]

Remark. Our proof gives that $c_1 = c_1(r, \mu, a_3)$, $c_2 = c_2(r, \mu, a_2, a_3)$ and
$\delta_0 = \delta_0(r, \mu, a_1, a_3)$.

Our next theorem is about "almost square" matrices. This theorem extends
[18, Theorem 3.1]. Here both conditions (iii) and (iv) are needed in order to
substitute condition (1).

Theorem 1.2. Let $r > 2$, $\mu \ge 1$, $a_1, a_2 > 0$, $a_3 \in (0, \mu)$, $a_4 \in (0, 1]$. Let
$1 \le n < N$ be integers, and write $N$ in the form $N = (1 + \delta)n$. Suppose $T$
is an $N \times n$ matrix whose entries are independent centered random variables
such that conditions (i), (ii), (iii) and (iv) are satisfied. There exist positive
constants $c_1$, $c_2$, $\tilde c_1$ and $\tilde c_2$, depending only on the parameters $r$, $\mu$, $a_1$, $a_2$, $a_3$,
$a_4$, and a positive constant $\gamma = \gamma(r, \mu, a_1, a_3) < 1$, such that if

\[ a_4 > 1 - \gamma \quad \text{and} \quad \delta \ge \frac{\tilde c_1}{\ln(2 + \tilde c_2 n)}, \]

then

\[ \mathbb{P}\big(s_n(T) \le c_1\sqrt{N}\big) \le e^{-c_2 N}. \]

Remarks. 1. Our proof gives that $c_1 = c_1(r, \mu, a_1, a_3, \delta)$, $c_2 = c_2(r, \mu, a_2, a_3)$,
$\tilde c_1 = \tilde c_1(r, \mu, a_1, a_3)$ and $\tilde c_2 = \tilde c_2(r, \mu, a_1, a_3, a_4)$.

2. Note that for small $n$, say for $n \le 2/\tilde c_2$, Theorem 1.2 is trivial for every
$\delta > 0$, either by adjusting the constant $c_2$ (for small $N$) or by using
Theorem 1.1 (for large $N$).

Let us note that in a sense our Theorems 1.1 and 1.2 are incomparable
with the corresponding result of [27]. First, we don't restrict our results only 
to the subgaussian case. The requirement of boundedness of the subgaussian 
moment is much stronger, implying in particular boundedness of moments of 
all orders, which naturally yields stronger estimates. Second, another con- 
dition essentially used in [27] is "entries are identically distributed." As was 
mentioned above, such a condition is inconsistent with our model, since having 
one zero we immediately get the zero matrix. 

Our third theorem shows that we can also extend to our setting the corre- 
sponding results from [25], where the i.i.d. case was treated, and from [HI2], 
which dealt with the case of independent log-concave columns. Note again that 
we work under the assumption of bounded r-th moment (for a fixed r > 2). 
In fact in [26] two theorems about square matrices were proved. The first 
one is for random matrices whose entries have bounded fourth moment. Our
Theorem 1.3 extends this result with much better probability. The second
main result of [26] requires the boundedness of subgaussian moments as well
as identical distributions of entries in each column, and, thus, is incomparable
with Theorem 1.3.

Theorem 1.3. Let $r > 2$, $\mu \ge 1$, $a_1, a_2, a_3, a_4 > 0$ with $a_3 < \mu$. Suppose $T$ is
an $n \times n$ matrix whose entries are independent centered random variables such
that conditions (i), (ii), (iii) and (iv) are satisfied. Then there exists a positive
constant $\gamma_0 = \gamma_0(r, \mu, a_1, a_3) < 1$ such that if $a_4 > 1 - \gamma_0$ then for every $\varepsilon \ge 0$

\[ \mathbb{P}\big(s_n(T) \le \varepsilon n^{-1/2}\big) \le C\big(\varepsilon + n^{1 - r/2}\big), \]

where $C$ depends on the parameters $r$, $\mu$, $a_1$, $a_2$, $a_3$, $a_4$.

Finally we would like to mention that all results can be extended to the 
complex case in a standard way. 

Acknowledgment. The authors would like to thank N. Tomczak-Jaegermann 
for many useful conversations. We also thank S. Spektor for showing us ref- 
erence [23] and S. O'Rourke for showing us reference [9]. The second named 
author thanks G. Schechtman for hosting him at the Weizmann Institute of 
Science in Spring 2008, during which time part of this work was done. 



2 Notation and preliminaries 

We start this section by agreeing on the notation that we will use throughout.
For $1 \le p \le \infty$, we write $\|x\|_p$ for the $\ell_p$-norm of $x = (x_i)_{i \ge 1}$, i.e. the norm
defined by

\[ \|x\|_p = \Big(\sum_{i \ge 1} |x_i|^p\Big)^{1/p} \ \text{ for } p < \infty \quad \text{and} \quad \|x\|_\infty = \sup_{i \ge 1} |x_i|. \]

Then, as usual, $\ell_p^n = (\mathbb{R}^n, \|\cdot\|_p)$. The unit ball of $\ell_p^n$ is denoted $B_p^n$. Also, $S^{n-1}$
denotes the unit sphere of $\ell_2^n$, and $e_1, \ldots, e_n$ is the canonical basis of $\ell_2^n$.

We write $\langle \cdot, \cdot \rangle$ for the standard inner product on $\mathbb{R}^n$. By $|x|$ we denote
the standard Euclidean norm (i.e. $\ell_2$-norm) of the vector $x = (x_i)_{i \ge 1}$. On the
other hand, when $A$ is a set, by $|A|$ we denote the cardinality of $A$.

The support of a vector $x = (x_i)_{i \ge 1}$, meaning the set of indices corresponding
to nonzero coordinates of $x$, is denoted by $\mathrm{supp}(x)$.

Given a subspace $E$ of $\mathbb{R}^n$ we denote by $P_E$ the orthogonal projection onto
$E$. If $E = \mathbb{R}^\sigma$ is the coordinate subspace corresponding to a set of coordinates
$\sigma \subset \{1, \ldots, n\}$, we will write $P_\sigma$ as a shorthand for $P_{\mathbb{R}^\sigma}$.

Let $\mathcal{N} \subset D \subset \mathbb{R}^n$ and $\varepsilon > 0$. Recall that $\mathcal{N}$ is called an $\varepsilon$-net of $D$ (in the
Euclidean metric) if

\[ D \subset \bigcup_{v \in \mathcal{N}} (v + \varepsilon B_2^n). \]

In case $D$ is the unit sphere $S^{n-1}$ or the unit ball $B_2^n$, a well known volumetric
argument (see for instance [23, Lemma 2.6]) establishes that for each $\varepsilon > 0$
there is an $\varepsilon$-net $\mathcal{N}$ of $D$ with cardinality $|\mathcal{N}| \le (1 + 2/\varepsilon)^n$.

2.1 Singular values. 

Suppose $T$ is an $N \times n$ matrix with real entries. The singular values of $T$,
denoted $s_k(T)$, are the eigenvalues of the $n \times n$ matrix $\sqrt{T^{t}T}$, arranged in
decreasing order. It is immediate that the singular values are all non-negative,
and further the number of nonzero singular values of $T$ equals the rank of $T$.

The largest singular value $s_1(T)$ and the smallest singular value $s_n(T)$ are
particularly important. They may be equivalently given by the expressions

\[ s_1(T) = \|T : \ell_2^n \to \ell_2^N\| = \sup\{|Tx| : |x| = 1\}, \qquad s_n(T) = \inf\{|Tx| : |x| = 1\}. \]

In particular for every vector $x \in \mathbb{R}^n$ one has

\[ s_n(T)|x| \le |Tx| \le s_1(T)|x|. \tag{2} \]

Note that the estimate on the left-hand side becomes trivial if $s_n(T) = 0$.
On the other hand, when $s_n(T) > 0$ the matrix $T$ is a bijection on its image,
and can be regarded as an embedding from $\ell_2^n$ into $\ell_2^N$, with (2) providing an
estimate for the distortion of the norms under $T$.
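
As a small illustration of these definitions and of inequality (2), the sketch below computes the two singular values of a matrix with two columns in closed form from the $2 \times 2$ Gram matrix $T^{t}T$. The helper names and the sample matrix (which contains null entries) are assumptions made for this example only.

```python
import math


def singular_values_two_columns(rows):
    """Return (s_1, s_2) for an N x 2 real matrix given as a list of (a, b) rows."""
    p = sum(a * a for a, _ in rows)   # entry (1,1) of the Gram matrix T^t T
    q = sum(a * b for a, b in rows)   # entry (1,2) of the Gram matrix
    s = sum(b * b for _, b in rows)   # entry (2,2) of the Gram matrix
    mean = (p + s) / 2.0
    radius = math.hypot((p - s) / 2.0, q)
    # Eigenvalues of the symmetric 2 x 2 Gram matrix are mean +/- radius;
    # singular values are their square roots.
    return math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))


def image_norm(rows, x):
    """Euclidean norm |Tx| for x = (x_1, x_2)."""
    return math.sqrt(sum((a * x[0] + b * x[1]) ** 2 for a, b in rows))


# Illustrative 3 x 2 matrix with several null entries.
rows = [(3.0, 0.0), (0.0, 4.0), (0.0, 0.0)]
s_max, s_min = singular_values_two_columns(rows)
```

For unit vectors $x$ the value `image_norm(rows, x)` then lies between `s_min` and `s_max`, as (2) predicts.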

To estimate the smallest singular number, we will be using the following
equivalence, which clearly holds for every matrix $T$ and every $\lambda \ge 0$:

\[ s_n(T) \le \lambda \iff \exists x \in S^{n-1} : |Tx| \le \lambda. \tag{3} \]

2.2 Subgaussian random variables.

All random quantities appearing in this work are defined on the same underlying
probability space $(\Omega, \mathcal{A}, \mathbb{P})$. We will present estimates for the smallest
singular value of matrices whose entries are independent random variables 
satisfying certain assumptions. Our results are valid for a large class of matri- 
ces which includes, in particular, those whose entries are subgaussian random 
variables. 

A (real-valued) random variable $X$ is called subgaussian when there exists
a positive constant $b$ such that for every $t \in \mathbb{R}$

\[ \mathbb{E} e^{tX} \le e^{b^2 t^2/2}. \]

When this condition is satisfied with a particular value of $b > 0$, we also say
that $X$ is $b$-subgaussian, or subgaussian with parameter $b$. The minimal $b$ in
this capacity is called the subgaussian moment of $X$.

It is an easy consequence of this definition that if $X$ is $b$-subgaussian, then
$\mathbb{E}(X) = 0$ and $\mathrm{Var}(X) \le b^2$. Thus all subgaussian random variables are
centered. The next proposition presents well-known equivalent conditions for
a centered random variable to be subgaussian.

Proposition 2.1. For a centered random variable $X$, the following statements
are equivalent:

(1) $\exists b > 0$, $\forall t \in \mathbb{R}$, $\mathbb{E} e^{tX} \le e^{b^2 t^2/2}$

(2) $\exists b > 0$, $\forall \lambda > 0$, $\mathbb{P}(|X| \ge \lambda) \le 2e^{-\lambda^2/b^2}$

(3) $\exists b > 0$, $\forall p \ge 1$, $(\mathbb{E}|X|^p)^{1/p} \le b\sqrt{p}$

(4) $\exists c > 0$, $\mathbb{E} e^{cX^2} < +\infty$

Two important examples of subgaussian random variables are the centered 
Gaussian themselves and the symmetric Bernoulli ±1 random variables. In 
general, any centered and bounded random variable is subgaussian. 

We point out that, as a consequence of the subgaussian tail estimate, the
norm of a matrix whose entries are independent subgaussian random variables
is of the order of $\sqrt{N}$ with high probability. Namely, the following proposition
holds (see e.g. [18, Fact 2.4], where this was shown for symmetric random
variables; the case of centered variables is essentially the same).

Proposition 2.2. Let $N \ge n \ge 1$ be positive integers. Suppose $T$ is an
$N \times n$ matrix whose entries are independent subgaussian random variables
with subgaussian parameters bounded above uniformly by $b$. Then there are
positive constants $c$, $C$ (depending only on $b$) such that for every $t > C$

\[ \mathbb{P}\big(\|T\| > t\sqrt{N}\big) \le e^{-ct^2 N}. \]

2.3 Compressible and incompressible vectors.



As equivalence (3) suggests, to estimate the smallest singular value of $T$ we
estimate the norm $|Tx|$ for vectors $x \in S^{n-1}$. More precisely, we will estimate
$|Tx|$ individually for vectors in an appropriately chosen $\varepsilon$-net and, as usual,
we use the union bound. In the case of "tall" matrices just one single $\varepsilon$-net
is enough for this approximation method to work; but in the case of "almost
square" matrices, as well as for square matrices, we will need to split the
sphere into two parts according to whether the vector $x$ is compressible or
incompressible, in the sense that we now define.

Let $m \le n$ and $\rho \in (0, 1)$. A vector $x \in \mathbb{R}^n$ is called

• $m$-sparse if $|\mathrm{supp}(x)| \le m$, that is, if $x$ has at most $m$ nonzero entries.

• $(m, \rho)$-compressible if it is within Euclidean distance $\rho$ from the set of
all $m$-sparse vectors.

• $(m, \rho)$-incompressible if it is not $(m, \rho)$-compressible.

The sets of sparse, compressible, and incompressible vectors will be denoted,
respectively, $\mathrm{Sparse}(m)$, $\mathrm{Comp}(m, \rho)$, and $\mathrm{Incomp}(m, \rho)$. The idea to split the
Euclidean sphere into two parts goes back to Kashin's work [15] on orthogonal
decomposition of $\ell_1^{2n}$, where the splitting was defined using the ratio of the $\ell_2$
and $\ell_1$ norms. This idea was recently used by Schechtman ([28]) in the same
context. The splitting of the sphere essentially as described above appeared in
[18, 19] and was later used in many works (e.g. in [26, 27]).

It is clear from these definitions that, for a vector $x$, the following holds:

\[ x \in \mathrm{Comp}(m, \rho) \iff \exists \sigma \subset \{1, \ldots, n\} \text{ with } |\sigma^c| \le m \text{ such that } |P_\sigma x| \le \rho, \]
\[ x \in \mathrm{Incomp}(m, \rho) \iff \forall \sigma \subset \{1, \ldots, n\} \text{ with } |\sigma^c| \le m \text{ one has } |P_\sigma x| > \rho. \]
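
The characterization above says that the distance from a vector to the set of sparse vectors is attained by discarding all but its largest coordinates. A minimal sketch of this computation follows; the function names are illustrative assumptions, not notation from the text.

```python
import math


def distance_to_sparse(x, m):
    """Euclidean distance from x to the set of m-sparse vectors."""
    # Keeping the m largest coordinates (in absolute value) is optimal, so the
    # distance is the Euclidean norm of the remaining n - m coordinates.
    tail = sorted((abs(c) for c in x), reverse=True)[m:]
    return math.sqrt(sum(c * c for c in tail))


def is_compressible(x, m, rho):
    """True when x lies within Euclidean distance rho of the m-sparse vectors."""
    return distance_to_sparse(x, m) <= rho
```

For instance, the flat unit vector $(1/2, 1/2, 1/2, 1/2)$ is at distance $\sqrt{3}/2$ from the 1-sparse vectors, so it is $(1, 1/2)$-incompressible, while it is $(3, 1/2)$-compressible.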



2.4 Two more results. 

Here we formulate two results, which will be used in the next section. The first
one is a quantitative version of the Central Limit Theorem (CLT), called the
Berry-Esseen inequality. The second one is a general form of the Paley-Zygmund
inequality (see e.g. [18, Lemma 3.5]).

Theorem 2.3 (Berry-Esseen CLT). Let $2 < r \le 3$. Let $\zeta_1, \ldots, \zeta_n$ be
independent centered random variables with finite $r$-th moments and set
$\sigma^2 := \sum_{k=1}^n \mathbb{E}|\zeta_k|^2$. Then for all $t \in \mathbb{R}$

\[ \Big| \mathbb{P}\Big(\frac{1}{\sigma}\sum_{k=1}^n \zeta_k \le t\Big) - \mathbb{P}(g \le t) \Big| \le \frac{C}{\sigma^r} \sum_{k=1}^n \mathbb{E}|\zeta_k|^r, \tag{4} \]

where $g \sim \mathcal{N}(0, 1)$ and $C$ is an absolute constant.

Remarks.

1. The standard form of the Berry-Esseen inequality requires a finite third moment
(i.e., it is usually stated for $r = 3$), see e.g. [10, p. 544] or [21, p. 300]. The
form used here is from [21] (see Theorem 5.7 there).

2. If $r > 3$, then clearly we have boundedness of the third moment for free, and
in this case we use the standard form of the Berry-Esseen inequality (i.e., with
$r = 3$).

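Lemma 2.4 (Paley-Zygmund inequality). Let $p \in (1, \infty)$, $q = p/(p - 1)$. Let
$f \ge 0$ be a random variable with $\mathbb{E} f^{2p} < \infty$. Then for every $0 \le \lambda \le \sqrt{\mathbb{E} f^2}$
we have

\[ \mathbb{P}(f > \lambda) \ge \frac{(\mathbb{E} f^2 - \lambda^2)^q}{(\mathbb{E} f^{2p})^{q/p}}. \]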
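
The Paley-Zygmund bound can be checked by hand on a random variable taking finitely many values. The sketch below evaluates both sides exactly for such a variable; the function names and the two-point example are assumptions chosen purely for illustration.

```python
def paley_zygmund_bound(values, probs, lam, p):
    """Right-hand side (E f^2 - lam^2)^q / (E f^{2p})^{q/p} for a discrete f >= 0."""
    q = p / (p - 1)
    second_moment = sum(w * v ** 2 for v, w in zip(values, probs))
    high_moment = sum(w * v ** (2 * p) for v, w in zip(values, probs))
    return (second_moment - lam ** 2) ** q / high_moment ** (q / p)


def tail_probability(values, probs, lam):
    """Left-hand side P(f > lam) for the same discrete f."""
    return sum(w for v, w in zip(values, probs) if v > lam)


# Hypothetical example: f = 2 with probability 1/4 and f = 0 otherwise,
# so that E f^2 = 1; we take p = 3/2 (hence q = 3) and lam = 1/2.
values, probs = [0.0, 2.0], [0.75, 0.25]
bound = paley_zygmund_bound(values, probs, lam=0.5, p=1.5)
tail = tail_probability(values, probs, lam=0.5)
```

Here the bound equals $(3/4)^3 / 2^2 = 27/256$, while the actual tail probability is $1/4$.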

3 Small ball probabilities for random sums

In this section we gather auxiliary results related to random sums, their small 
ball probabilities, etc., which are needed later. In fact, we adjust corresponding 
results from [T8] and [2E] to our setting. These results are also of independent 
interest. We provide proofs for the sake of completeness. 

The following lemma provides a lower bound on the small ball probability
of a random sum. Its proof follows the steps of [18, Lemma 3.6] with the
appropriate modifications to deal with centered random variables (rather than
symmetric), to remove the assumption that the variances are bounded from
below uniformly, and to replace the condition of finite third moments by finite
$r$-th moments ($r > 2$).

Lemma 3.1. Let $2 < r \le 3$ and $\mu \ge 1$. Suppose $\xi_1, \ldots, \xi_n$ are independent
centered random variables such that $\mathbb{E}|\xi_i|^r \le \mu^r$ for every $i = 1, \ldots, n$. Let
$x = (x_i) \in \ell_2$ be such that $|x| = 1$. Then for every $\lambda \ge 0$

\[ \mathbb{P}\Big(\Big|\sum_{i=1}^n \xi_i x_i\Big| > \lambda\Big) \ge \Big(\frac{\big[\mathbb{E}\sum_{i=1}^n \xi_i^2 x_i^2 - \lambda^2\big]_+}{8\mu^2}\Big)^{r/(r-2)}, \]

where $[s]_+ = \max\{s, 0\}$.

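Proof. Define $f = \big|\sum_{i=1}^n \xi_i x_i\big|$. Let $\varepsilon_1, \ldots, \varepsilon_n$ be independent symmetric
Bernoulli $\pm 1$ random variables, which are also independent of $\xi_1, \ldots, \xi_n$. Using
the symmetrization inequality [17, Lemma 6.3], and applying Khinchine's
inequality, we obtain

\[ \mathbb{E} f^r \le 2^r\, \mathbb{E}\Big|\sum_{i \ge 1} \varepsilon_i \xi_i x_i\Big|^r = 2^r\, \mathbb{E}_\xi \mathbb{E}_\varepsilon \Big|\sum_{i \ge 1} \varepsilon_i \xi_i x_i\Big|^r \le 2^{3r/2}\, \mathbb{E}_\xi \Big(\sum_{i \ge 1} \xi_i^2 x_i^2\Big)^{r/2}. \]

Now consider the set

\[ \mathcal{S} := \Big\{ s = (s_i) \in \ell_1 : s_i \ge 0 \text{ for every } i \text{ and } \sum_{i \ge 1} s_i = 1 \Big\}. \]

We define a function $\varphi : \mathcal{S} \to \mathbb{R}$ by

\[ \varphi(s) = \mathbb{E}_\xi \Big(\sum_{i \ge 1} \xi_i^2 s_i\Big)^{r/2}. \]

This function is clearly convex, so that

\[ \sup_{s \in \mathcal{S}} \varphi(s) = \sup_{i \ge 1} \varphi(e_i) = \sup_{i \ge 1} \mathbb{E}(\xi_i^2)^{r/2} \le \mu^r. \]

Thus $\mathbb{E} f^r \le 2^{3r/2} \mu^r$. On the other hand, using the independence of $\xi_1, \ldots, \xi_n$,

\[ \mathbb{E} f^2 = \sum_{i \ge 1} \mathbb{E}\,\xi_i^2 x_i^2. \]

Lemma 2.4 with $p = r/2$, $q = r/(r - 2)$ implies the desired estimate. ■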
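
For the reader's convenience, the arithmetic behind this last step can be sketched as follows. This is a reconstruction from the two moment bounds obtained in the proof, under the assumption that $\lambda^2 \le \mathbb{E} f^2$ (otherwise there is nothing to prove), and using $q/p = 2/(r-2)$:

```latex
\[
\mathbb{P}(f > \lambda)
\;\ge\; \frac{(\mathbb{E} f^2 - \lambda^2)^{q}}{(\mathbb{E} f^{2p})^{q/p}}
\;=\; \frac{(\mathbb{E} f^2 - \lambda^2)^{r/(r-2)}}{(\mathbb{E} f^{r})^{2/(r-2)}}
\;\ge\; \frac{(\mathbb{E} f^2 - \lambda^2)^{r/(r-2)}}{\big(2^{3r/2}\mu^{r}\big)^{2/(r-2)}}
\;=\; \Big(\frac{\mathbb{E} f^2 - \lambda^2}{8\mu^2}\Big)^{r/(r-2)} .
\]
```

The last equality uses $\big(2^{3r/2}\big)^{2/(r-2)} = 8^{r/(r-2)}$ and $\big(\mu^{r}\big)^{2/(r-2)} = (\mu^2)^{r/(r-2)}$.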

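The next proposition, which is a consequence of Theorem 2.3, allows us
to estimate the small ball probability. The proof goes along the same lines as
the proof of [18, Proposition 3.2] (see also [20, Proposition 3.4]), with slight
modifications to remove the assumption about variances. Recall that for a
subset $\sigma \subset \{1, 2, \ldots, n\}$, $P_\sigma$ denotes the coordinate projection onto $\mathbb{R}^\sigma$.

Proposition 3.2. Let $2 < r \le 3$ and $\mu \ge 1$. Let $(\xi_i)_{i=1}^n$ be independent
centered random variables with $\mathbb{E}|\xi_i|^r \le \mu^r$ for all $i = 1, 2, \ldots, n$. There is a
universal constant $c > 0$ such that

(a) For every $a < b$ and every $x = (x_i) \in \mathbb{R}^n$ satisfying $A := \sqrt{\mathbb{E}\sum_{i=1}^n \xi_i^2 x_i^2} > 0$
one has

\[ \mathbb{P}\Big(a \le \sum_{i=1}^n \xi_i x_i < b\Big) \le \frac{b - a}{\sqrt{2\pi}\, A} + c\Big(\frac{\|x\|_r\, \mu}{A}\Big)^r. \]

(b) For every $t > 0$, every $x = (x_i) \in \mathbb{R}^n$ and every $\sigma \subset \{1, 2, \ldots, n\}$
satisfying $A_\sigma := \sqrt{\mathbb{E}\sum_{i \in \sigma} \xi_i^2 x_i^2} > 0$ one has

\[ \sup_{v \in \mathbb{R}} \mathbb{P}\Big(\Big|\sum_{i=1}^n x_i \xi_i - v\Big| < t\Big) \le \sqrt{\frac{2}{\pi}}\, \frac{t}{A_\sigma} + c\Big(\frac{\|P_\sigma x\|_r\, \mu}{A_\sigma}\Big)^r. \]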

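The next corollary gives an estimate on the small ball probability in the
spirit of [26, Corollary 2.10].

Corollary 3.3. Let $2 < r \le 3$ and $\mu \ge 1$. Let $\xi_1, \ldots, \xi_n$ be independent
centered random variables with $\mathbb{E}|\xi_i|^r \le \mu^r$ for every $i = 1, \ldots, n$. Suppose
$x = (x_i) \in \mathbb{R}^n$ and $\sigma \subset \{1, \ldots, n\}$ are such that $A \le |x_i| \le B$ and $\mathbb{E}\xi_i^2 \ge 1$
for all $i \in \sigma$. Then for all $t > 0$

\[ \sup_{v \in \mathbb{R}} \mathbb{P}\Big(\Big|\sum_{i=1}^n x_i \xi_i - v\Big| < t\Big) \le \frac{C}{|\sigma|^{r/2 - 1}} \Big(\frac{t}{A} + \mu^r \Big(\frac{B}{A}\Big)^r\Big), \]

where $C > 0$ is an absolute constant.

Proof. By the assumptions on the coordinates of $x$ we have

\[ A_\sigma^2 := \mathbb{E}\sum_{i \in \sigma} \xi_i^2 x_i^2 \ge |\sigma| A^2 \]

and

\[ \|P_\sigma x\|_r^r = \sum_{i \in \sigma} |x_i|^r \le |\sigma| B^r. \]

Then, by part (b) of Proposition 3.2,

\[ \sup_{v \in \mathbb{R}} \mathbb{P}\Big(\Big|\sum_{i=1}^n x_i \xi_i - v\Big| < t\Big) \le \sqrt{\frac{2}{\pi}}\, \frac{t}{|\sigma|^{1/2} A} + c\, \mu^r\, \frac{|\sigma| B^r}{|\sigma|^{r/2} A^r} \le \frac{C}{|\sigma|^{r/2 - 1}} \Big(\frac{t}{A} + \mu^r \Big(\frac{B}{A}\Big)^r\Big). \]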



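We need the following lemma proved in [26, Lemma 3.4].

Lemma 3.4. Let $\gamma, \rho \in (0, 1)$, and let $x \in \mathrm{Incomp}(\gamma n, \rho)$. Then there exists a
set $\sigma = \sigma_x \subset \{1, \ldots, n\}$ of cardinality $|\sigma| \ge \frac{1}{2}\rho^2 \gamma n$ and such that for all $k \in \sigma$

\[ \frac{\rho}{\sqrt{2n}} \le |x_k| \le \frac{1}{\sqrt{\gamma n}}. \]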
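
The set of "spread" coordinates in Lemma 3.4 is simply the set of indices whose coordinates fall in the window $[\rho/\sqrt{2n},\, 1/\sqrt{\gamma n}]$. A minimal sketch follows; the function name is an illustrative assumption, and indices are 0-based as is usual in Python.

```python
import math


def spread_set(x, gamma, rho):
    """Indices k with rho / sqrt(2n) <= |x_k| <= 1 / sqrt(gamma * n)."""
    n = len(x)
    lower = rho / math.sqrt(2 * n)
    upper = 1.0 / math.sqrt(gamma * n)
    return [k for k, c in enumerate(x) if lower <= abs(c) <= upper]


# Hypothetical example: the flat unit vector in dimension 16, which is
# (gamma * n, rho)-incompressible for gamma = rho = 1/2.
flat = [0.25] * 16
sigma = spread_set(flat, gamma=0.5, rho=0.5)
```

For this flat vector every coordinate is spread, so the set has 16 elements, well above the guaranteed size $\frac{1}{2}\rho^2\gamma n = 1$.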

The next lemma is a version of Lemma 3.7], modified in order to remove 
the assumption "variances > 1". 

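Lemma 3.5. Let $2 < r \le 3$ and $\mu \ge 1$. Let $\xi_1, \ldots, \xi_n$ be independent centered
random variables with $\mathbb{E}|\xi_i|^r \le \mu^r$ for every $i$. Suppose $\sigma_0 := \{i : \mathbb{E}\xi_i^2 \ge 1\}$
has cardinality $|\sigma_0| \ge a_4 n$. Let $\gamma, \rho \in (0, 1)$, and consider a vector
$x \in \mathrm{Incomp}(\gamma n, \rho)$. Assuming that $a_4 + \frac{1}{2}\rho^2\gamma > 1$ we have for every $t > 0$

\[ \sup_{v \in \mathbb{R}} \mathbb{P}\Big(\Big|\sum_{i=1}^n x_i \xi_i - v\Big| < t\Big) \le c\big(t\, n^{(3-r)/2} + \mu^r n^{(2-r)/2}\big), \]

where $c$ is a positive constant which depends on $\gamma$, $\rho$, $a_4$, and $r$.

Proof. Let $\sigma_x$ be the set of spread coefficients of $x$ from Lemma 3.4, so that
$|\sigma_x| \ge \frac{1}{2}\rho^2\gamma n$. Set $\sigma := \sigma_0 \cap \sigma_x$. Then

\[ |\sigma| = |\sigma_0| + |\sigma_x| - |\sigma_0 \cup \sigma_x| \ge a_4 n + \tfrac{1}{2}\rho^2\gamma n - n =: c_0 n. \]

By the construction, for every $i \in \sigma$ we have

\[ \frac{\rho}{\sqrt{2n}} \le |x_i| \le \frac{1}{\sqrt{\gamma n}}. \]

Applying Corollary 3.3 we obtain

\[ \sup_{v \in \mathbb{R}} \mathbb{P}\Big(\Big|\sum_{i=1}^n x_i \xi_i - v\Big| < t\Big) \le \frac{C}{|\sigma|^{r/2-1}} \Big(\frac{\sqrt{2n}\, t}{\rho} + \mu^r \Big(\frac{\sqrt{2}}{\rho\sqrt{\gamma}}\Big)^r\Big) \le \frac{C}{(c_0 n)^{r/2-1}} \Big(\frac{\sqrt{2n}\, t}{\rho} + \mu^r \Big(\frac{\sqrt{2}}{\rho\sqrt{\gamma}}\Big)^r\Big) \]
\[ \le c\big(t\, n^{(3-r)/2} + \mu^r n^{(2-r)/2}\big). \]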

4 "Tall" matrices (proof of Theorem 1.1)

In this section we prove Theorem 1.1, which establishes an estimate on the
smallest singular value for "tall" random matrices, meaning matrices whose
aspect ratio $n/N$ is bounded above by a small positive constant (independent
of $n$ and $N$). It is important to notice that Theorem 1.1 uses only conditions
(i), (ii), and (iii), i.e. no condition on the rows is required here.

The proof depends upon an estimate on the norm \Tx\ for a fixed vector x, 
which is provided by the following proposition. 

Proposition 4.1. Let $1 \le n < N$ be positive integers. Suppose $T$ is a matrix of
size $N \times n$ whose entries are independent centered random variables satisfying
conditions (i), (ii) and (iii) for some $2 < r \le 3$, $\mu \ge 1$ and $a_1, a_2, a_3 > 0$ with
$a_3 < \mu$. Then for every $x \in S^{n-1}$ we have

\[ \mathbb{P}\big(|Tx| \le b_1\sqrt{N}\big) \le e^{-b_2 N}, \]

where $b_1, b_2 > 0$ depend only on $\mu$, $a_3$ and $r$.
Remark. Our proof gives that



We postpone the proof of this technical result to the last section, so that 
we may keep the flow of our exposition uninterrupted. 

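Proof of Theorem 1.1. Passing to $r_0 = \min\{3, r\}$ we may assume without
loss of generality that $r \le 3$.

Let $t \ge 0$ and $\Omega_0 := \{\omega : \|T\| \le a_1\sqrt{N}\}$. By (3) it is enough to estimate
the probability of the event

\[ E := \{\omega : \exists x \in S^{n-1} \text{ s.t. } |Tx| \le t\sqrt{N}\}. \]

To this end we use the inclusion $E \subset (E \cap \Omega_0) \cup \Omega_0^c$ and the union bound.

To estimate $\mathbb{P}(E \cap \Omega_0)$, let $0 < \varepsilon \le 1$, and let $\mathcal{N}$ be an $\varepsilon$-net of $S^{n-1}$
with cardinality $|\mathcal{N}| \le (3/\varepsilon)^n$. For any $x \in S^{n-1}$ we can find $y \in \mathcal{N}$ such
that $|x - y| \le \varepsilon$. If further $x$ satisfies $|Tx| \le t\sqrt{N}$, then, on $\Omega_0$, the
corresponding $y$ satisfies

\[ |Ty| \le |Tx| + \|T\| \cdot |y - x| \le t\sqrt{N} + \varepsilon a_1\sqrt{N} = (t + \varepsilon a_1)\sqrt{N}. \tag{5} \]

Taking $\varepsilon = \min\{1, t/a_1\}$, we see that for each $x \in S^{n-1}$ satisfying $|Tx| \le t\sqrt{N}$
there is a corresponding $y \in \mathcal{N}$ such that $|x - y| \le \varepsilon$ and $|Ty| \le 2t\sqrt{N}$. Hence,
using the union bound, setting $t = b_1/2$ and using Proposition 4.1 one has

\[ \mathbb{P}(E \cap \Omega_0) \le \sum_{y \in \mathcal{N}} \mathbb{P}\big(|Ty| \le 2t\sqrt{N}\big) \le |\mathcal{N}|\, e^{-b_2 N} \le \Big(\frac{3}{\varepsilon}\Big)^n e^{-b_2 N}, \]

where $b_1$ and $b_2$ are as in Proposition 4.1. Thus

\[ \mathbb{P}(E \cap \Omega_0) \le \exp\Big(-\frac{b_2 N}{2}\Big) \]

as long as

\[ \Big(\frac{3}{\varepsilon}\Big)^n \le \exp\Big(\frac{b_2 N}{2}\Big). \]

Bearing in mind that $N = (1 + \delta)n$, we can see that the last condition is
satisfied if

\[ \delta \ge \delta_0 := \max\Big\{\frac{2}{b_2}\ln\Big(\frac{6a_1}{b_1}\Big),\ \frac{2}{b_2}\ln 3\Big\}. \tag{6} \]

To finish, we use $\mathbb{P}(E) \le \mathbb{P}(E \cap \Omega_0) + \mathbb{P}(\Omega_0^c)$ with the estimate for $\mathbb{P}(E \cap \Omega_0)$
just obtained and the estimate $\mathbb{P}(\Omega_0^c) \le e^{-a_2 N}$ coming from condition (ii). ■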



5 "Almost square" matrices (proof of Theorem 1.2)

In this section we prove Theorem 1.2. We will be using all conditions (i)
through (iv). The two key ingredients for the proof of this theorem are
Proposition 4.1 and Proposition 3.2.

Proof of Theorem 1.2. Passing to $r_0 = \min\{3, r\}$ we may assume without
loss of generality that $r \le 3$.
Consider the event

\[ E := \{\omega : \exists x \in S^{n-1} \text{ s.t. } |Tx| \le t\sqrt{N}\}. \]

By equivalence (3) we are to estimate $\mathbb{P}(E)$ with an appropriate value of $t$
(which will be specified later).

We split the set $E$ into two sets $E_C$ and $E_I$ defined as follows:

\[ E_C = \{\omega : \exists x \in S^{n-1} \cap \mathrm{Comp}(m, \rho) \text{ s.t. } |Tx| \le t\sqrt{N}\}, \]
\[ E_I = \{\omega : \exists x \in S^{n-1} \cap \mathrm{Incomp}(m, \rho) \text{ s.t. } |Tx| \le t\sqrt{N}\}, \]

where $m \le n$ and $\rho \in (0, 1)$ will be specified later.

Define $\Omega_0 := \{\omega : \|T\| \le a_1\sqrt{N}\}$. We will estimate $\mathbb{P}(E)$ using the union
bound in the inclusion

\[ E \subset (E_C \cap \Omega_0) \cup (E_I \cap \Omega_0) \cup \Omega_0^c. \tag{7} \]

Our proof will require that $t \le 1$ (which will be satisfied once we choose $t$,
see (22) below); and furthermore that $t$ and $\rho$ satisfy

\[ \frac{2t}{a_1} \le \rho \le \frac{1}{4}. \tag{8} \]

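Case 1: Probability of $E_C \cap \Omega_0$. We work on the set $\mathrm{Comp}(m, \rho)$, where $m \le n$
and $\rho \in (0, 1)$ will be specified later.

Given $x \in S^{n-1} \cap \mathrm{Comp}(m, \rho)$, choose $y \in \mathrm{Sparse}(m)$ so that $|y - x| \le \rho$.
It is clear that we may choose such a $y$ in $B_2^n$ (and thus $1 - \rho \le |y| \le 1$). Note
that on $\Omega_0$ we have $\|T\| \le a_1\sqrt{N}$. Thus if $x$ satisfies $|Tx| \le t\sqrt{N}$ then

\[ |Ty| \le |Tx| + \|T\| \cdot |y - x| \le t\sqrt{N} + a_1\rho\sqrt{N} = (t + a_1\rho)\sqrt{N}. \]

Let $\mathcal{N}$ be a $\rho$-net in the set $B_2^n \cap \mathrm{Sparse}(m)$. We may choose such a net
with cardinality

\[ |\mathcal{N}| \le \binom{n}{m}\Big(\frac{3}{\rho}\Big)^m \le \Big(\frac{en}{m}\Big)^m \Big(\frac{3}{\rho}\Big)^m = \Big(\frac{3en}{\rho m}\Big)^m. \]

For $y \in B_2^n \cap \mathrm{Sparse}(m)$ chosen above, let $v \in \mathcal{N}$ be such that $|v - y| \le \rho$.
We observe that, by (8),

\[ |v| \ge |y| - \rho \ge 1 - 2\rho \ge \frac{1}{2}, \]

and, by another use of (8),

\[ |Tv| \le |Ty| + \|T\| \cdot |v - y| \le (t + a_1\rho)\sqrt{N} + \rho a_1\sqrt{N} = (t + 2a_1\rho)\sqrt{N} \le \frac{5a_1\rho}{2}\sqrt{N} \le 5a_1\rho\sqrt{N}\,|v|. \]

Hence

\[ \mathbb{P}(E_C \cap \Omega_0) \le \mathbb{P}\big(\exists v \in \mathcal{N} \text{ s.t. } |Tv| \le 5a_1\rho\sqrt{N}\,|v|\big) \le \sum_{v \in \mathcal{N}} \mathbb{P}\big(|Tv| \le 5a_1\rho\sqrt{N}\,|v|\big). \tag{9} \]

Using Proposition 4.1, we obtain

\[ \mathbb{P}\big(|Tv| \le 5a_1\rho\sqrt{N}\,|v|\big) \le e^{-b_2 N} \]

provided that

\[ 5a_1\rho \le b_1. \tag{10} \]

We choose

\[ \rho := \min\Big\{\frac{1}{4},\ \frac{b_1}{5a_1}\Big\} \tag{11} \]

so that both (10) and the right hand side of (8) are true. Now, from (9), we
have

\[ \mathbb{P}(E_C \cap \Omega_0) \le |\mathcal{N}|\, e^{-b_2 N} \le \Big(\frac{3en}{\rho m}\Big)^m e^{-b_2 N}. \]

Thus, if

\[ m \ln\Big(\frac{3en}{\rho m}\Big) \le \frac{b_2 N}{2} \tag{12} \]

then

\[ \mathbb{P}(E_C \cap \Omega_0) \le e^{-b_2 N/2}. \tag{13} \]

Writing $m = \gamma n$, we see that inequality (12) is satisfied if

\[ \gamma \ln\Big(\frac{3e}{\rho\gamma}\Big) \le \frac{b_2}{2}, \]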

so we choose 



7 = (14) 

Case 2: Probability of $E_I \cap \Omega_0$. We work on the set $\mathrm{Incomp}(m, \rho)$, where $\rho$
is defined in (11) and $m = \gamma n$ with $\gamma$ chosen in (14).

For convenience we set $\alpha := t^{1/(r-2)}/a_1$. Since $t \le 1$ and in view of (8), we
observe that $\alpha \le \rho/2$. Recall also that on $\Omega_0$ we have $\|\Gamma\| \le a_1\sqrt{N}$.

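Let $\mathcal{N}$ be an $\alpha$-net of $S^{n-1}$ with cardinality $|\mathcal{N}| \le (3/\alpha)^n$. Let $x \in S^{n-1} \cap \mathrm{Incomp}(m, \rho)$
be such that $|\Gamma x| \le t\sqrt{N}$. Recall that, by incompressibility of $x$, one has $|P_\sigma x| \ge \rho$
for every $\sigma \subset \{1, \dots, n\}$ with $|\sigma^c| \le m$. Then there is $v \in \mathcal{N}$ such that $|\Gamma v| \le 2t\sqrt{N}$
and with the additional property $|P_\sigma v| \ge \rho/2$ for each $\sigma \subset \{1, \dots, n\}$
with $|\sigma^c| \le m$. Indeed, choosing $v \in \mathcal{N}$ such that $|x - v| \le \alpha$ and using
$a_1\alpha = t^{1/(r-2)} \le t$ (which holds by the choice of $\alpha$), we have

$$|\Gamma v| \le |\Gamma x| + \|\Gamma\| \cdot |v - x| \le t\sqrt{N} + a_1\sqrt{N}\,\alpha \le 2t\sqrt{N}$$

and

$$|P_\sigma v| \ge |P_\sigma x| - |P_\sigma(v - x)| \ge \rho - \alpha \ge \frac{\rho}{2},$$

where we used the condition $2\alpha \le 2t/a_1 \le \rho$, required in (8).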
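
The two facts about $\alpha$ used in this approximation step, namely $a_1\alpha \le t$ and $\alpha \le \rho/2$, follow from $t \le 1$, $r \le 3$ and condition (8). A minimal Python sketch of this arithmetic is given below; the names `alpha_from_t` and `alpha_conditions_hold`, the tolerance `tol`, and the sample values are hypothetical and serve only as an illustration.

```python
def alpha_from_t(t: float, r: float, a1: float) -> float:
    """alpha := t^(1/(r-2)) / a_1, as in the text."""
    return t ** (1.0 / (r - 2.0)) / a1


def alpha_conditions_hold(t: float, r: float, a1: float, rho: float,
                          tol: float = 1e-12) -> bool:
    """Check a_1*alpha <= t and alpha <= rho/2 under the standing assumptions.

    Assumed inputs: 0 < t <= 1, 2 < r <= 3, and 2t/a_1 <= rho <= 1/4 (condition (8)).
    The tolerance only absorbs floating-point rounding.
    """
    if not (0.0 < t <= 1.0 and 2.0 < r <= 3.0):
        raise ValueError("need 0 < t <= 1 and 2 < r <= 3")
    if not (2.0 * t / a1 <= rho <= 0.25):
        raise ValueError("condition (8) is violated")
    alpha = alpha_from_t(t, r, a1)
    return a1 * alpha <= t + tol and alpha <= rho / 2.0 + tol


if __name__ == "__main__":
    print(alpha_conditions_hold(t=0.01, r=2.5, a1=2.0, rho=0.2))
    print(alpha_conditions_hold(t=0.1, r=3.0, a1=1.0, rho=0.25))
```

Since $1/(r-2) \ge 1$ when $r \le 3$, raising $t \le 1$ to that power can only decrease it, which is what the first comparison reflects.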
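Denote by $\mathcal{A}$ the set of all $v \in \mathcal{N}$ with the property that for each set
$\sigma \subset \{1, \dots, n\}$ with $|\sigma^c| \le m$ we have $|P_\sigma v| \ge \rho/2$. Then

$$\mathbb{P}(E_I \cap \Omega_0) \le \mathbb{P}\big(\exists v \in \mathcal{A} : |\Gamma v| \le 2t\sqrt{N}\big). \tag{15}$$

Now, for each fixed $v = (v_i) \in \mathcal{A}$ we have

$$\mathbb{P}\big(|\Gamma v|^2 \le 4t^2 N\big) = \mathbb{P}\Big(N - \frac{1}{4t^2}|\Gamma v|^2 \ge 0\Big) \le \mathbb{E}\exp\Big\{N - \frac{1}{4t^2}|\Gamma v|^2\Big\}$$
$$= e^N\,\mathbb{E}\exp\Big\{-\frac{1}{4t^2}\sum_{j=1}^N\Big|\sum_{i=1}^n \xi_{ji}v_i\Big|^2\Big\} = e^N \prod_{j=1}^N \mathbb{E}\exp\Big\{-\frac{1}{4t^2}\Big|\sum_{i=1}^n \xi_{ji}v_i\Big|^2\Big\}, \tag{16}$$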



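and our goal is to make this last expression small. To estimate the expectations
we use the distribution formula:

$$\mathbb{E}\exp\Big\{-\frac{1}{4t^2}\Big|\sum_{i=1}^n \xi_{ji}v_i\Big|^2\Big\} = \int_0^1 \mathbb{P}\Big(\exp\Big\{-\frac{1}{4t^2}\Big|\sum_{i=1}^n \xi_{ji}v_i\Big|^2\Big\} > s\Big)\,ds = \int_0^\infty u e^{-u^2/2}\,\mathbb{P}\Big(\Big|\sum_{i=1}^n \xi_{ji}v_i\Big| < \sqrt{2}\,tu\Big)\,du. \tag{17}$$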
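
The distribution formula (17) can be checked numerically in a case where both sides are explicit. The Python sketch below assumes, purely for illustration, that the sum $X = \sum_i \xi_{ji}v_i$ is Gaussian with standard deviation $s$ (the paper makes no such assumption). Then $\mathbb{E}\exp\{-X^2/(4t^2)\} = (1 + s^2/(2t^2))^{-1/2}$ and $\mathbb{P}(|X| < \sqrt{2}\,tu) = \operatorname{erf}(tu/s)$. The function names and the trapezoidal integration parameters are hypothetical.

```python
import math


def laplace_closed_form(s: float, t: float) -> float:
    """E exp(-X^2 / (4 t^2)) for X ~ N(0, s^2), computed in closed form."""
    return 1.0 / math.sqrt(1.0 + s * s / (2.0 * t * t))


def laplace_via_small_ball(s: float, t: float,
                           upper: float = 12.0, steps: int = 20_000) -> float:
    """Right-hand side of (17) for X ~ N(0, s^2), by the trapezoidal rule.

    The integrand is u * exp(-u^2/2) * P(|X| < sqrt(2) t u), and for a
    centered Gaussian with standard deviation s this probability is erf(t u / s).
    The integral over [0, infinity) is truncated at `upper`.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        u = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * u * math.exp(-0.5 * u * u) * math.erf(t * u / s)
    return total * h


if __name__ == "__main__":
    for s, t in [(1.0, 1.0), (2.0, 0.3), (0.5, 0.05)]:
        lhs = laplace_closed_form(s, t)
        rhs = laplace_via_small_ball(s, t)
        print(f"s={s}, t={t}: closed form {lhs:.8f}, integral {rhs:.8f}")
```

The agreement of the two values illustrates why small-ball estimates for $\big|\sum_i \xi_{ji}v_i\big|$ translate directly into bounds on the exponential moments in (16).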



It is now apparent that we need to estimate the quantities

$$f_j(\lambda) := \mathbb{P}\Big(\Big|\sum_{i=1}^n \xi_{ji}v_i\Big| < \lambda\Big), \qquad j \le N.$$



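To this end, note that for each row $j \in \{1, \dots, N\}$ there exists $\sigma_j \subset \{1, \dots, n\}$
with cardinality $|\sigma_j| \ge a_4 n$ such that $\mathbb{E}\xi_{ji}^2 \ge 1$ for all $i \in \sigma_j$ (this is condition
(iv)). Also, for each fixed $v$, set

$$\sigma_v := \{i : |v_i| > \alpha\}.$$

Since $v \in S^{n-1}$ we have $|\sigma_v| \le 1/\alpha^2$.
Set $\bar\sigma_j = \sigma_j \setminus \sigma_v$, and note that

$$|\bar\sigma_j| \ge a_4 n - \frac{1}{\alpha^2}.$$

It follows that $|\bar\sigma_j^c| \le (1 - a_4)n + \frac{1}{\alpha^2}$, so to have $|\bar\sigma_j^c| \le m$ it suffices to require

$$(1 - a_4)n + \frac{1}{\alpha^2} \le m. \tag{18}$$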
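Note that (18), in particular, implies $1/\alpha^2 \le a_4 n \le n$. Recall that $m = \gamma n$,
where $\gamma$ was chosen in (14). Then inequality (18) is satisfied if $a_4 > 1 - \gamma$
(which is the condition on $\gamma$ in our Theorem) and

$$t \ge \Big(\frac{a_1}{\sqrt{(\gamma + a_4 - 1)n}}\Big)^{r-2}. \tag{19}$$

Now, since $|\bar\sigma_j^c| \le m$, we have $|P_{\bar\sigma_j} v| \ge \rho/2$, and hence

$$\mathbb{E}\sum_{i \in \bar\sigma_j} \xi_{ji}^2 v_i^2 \ge \sum_{i \in \bar\sigma_j} v_i^2 = |P_{\bar\sigma_j} v|^2 \ge \frac{\rho^2}{4}$$

(where we have used the property $\mathbb{E}\xi_{ji}^2 \ge 1$ for $i \in \bar\sigma_j$). Consequently, using
Proposition 3.2 and keeping in mind $|v_i| \le \alpha$ for $i \in \bar\sigma_j$, we get

$$f_j(\lambda) \le c\Big(\frac{\lambda}{\rho} + \frac{\mu^r}{\rho^r}\,\|P_{\bar\sigma_j} v\|_r^r\Big) \le c\Big(\frac{\lambda}{\rho} + \frac{\mu^r \alpha^{r-2}}{\rho^r}\Big)$$

for some absolute constant $c \ge 1$. Then, continuing from (17) we have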

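$$\mathbb{E}\exp\Big\{-\frac{1}{4t^2}\Big|\sum_{i=1}^n \xi_{ji}v_i\Big|^2\Big\} = \int_0^\infty u e^{-u^2/2} f_j(\sqrt{2}\,tu)\,du \le c\int_0^\infty u e^{-u^2/2}\Big(\frac{\sqrt{2}\,tu}{\rho} + \frac{\mu^r\alpha^{r-2}}{\rho^r}\Big)\,du$$
$$= \frac{c\sqrt{2}\,t}{\rho}\int_0^\infty u^2 e^{-u^2/2}\,du + \frac{c\mu^r\alpha^{r-2}}{\rho^r}\int_0^\infty u e^{-u^2/2}\,du = \frac{c\sqrt{\pi}\,t}{\rho} + \frac{c\mu^r t}{\rho^r a_1^{r-2}} = c_3 t,$$

where

$$c_3 := c\Big(\frac{\sqrt{\pi}}{\rho} + \frac{\mu^r}{\rho^r a_1^{r-2}}\Big). \tag{20}$$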



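Therefore, from (16), we get (for each fixed $v \in \mathcal{A}$)

$$\mathbb{P}\big(|\Gamma v| \le 2t\sqrt{N}\big) \le e^N (c_3 t)^N = (c_3 e t)^N,$$

and from this, in (15) we get

$$\mathbb{P}(E_I \cap \Omega_0) \le |\mathcal{A}|\,(c_3 e t)^N \le \Big(\frac{3}{\alpha}\Big)^n (c_3 e t)^N = \Big(\frac{3a_1}{t^{1/(r-2)}}\Big)^n (c_3 e t)^N.$$

Then we can make

$$\mathbb{P}(E_I \cap \Omega_0) \le e^{-N} \tag{21}$$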
(21) 
provided that

1 / 1 x 1 /^ 

G 2) • ( 22 ) 

c 3 e z \6aiCse z / 

Choose $t$ to satisfy equality in (22). Note

so the left hand side of (8) holds. Finally note that (19) is satisfied whenever

^ ln(3aic 3 e 2 ) g x 



8 > 



1,, / (7+Q4-i)n \ ln(c 2 n) 

111 1 ^(cge 2 ) 2 /^" 2 ) 7 



To finish, we take probabilities in (7) and we use the estimates for $\mathbb{P}(E_C \cap \Omega_0)$
and $\mathbb{P}(E_I \cap \Omega_0)$ we have found in (13) and (21), respectively, combined with
the estimate $\mathbb{P}(\Omega_0^c) \le e^{-a_2 N}$ coming from condition (ii). This shows that, with
the chosen $t$, we have $\mathbb{P}(E) \le e^{-b_2 N/2} + e^{-N} + e^{-a_2 N}$, which completes the
proof. ■



6 Square matrices (proof of Theorem 1.3)



In this section our goal is to prove Theorem 1.3. We are going to use two
lemmas from [26]. The first one is [26, Lemma 3.5]. Note that the proof given
there works for any random matrix.

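Lemma 6.1. Let $\Gamma$ be any random matrix of size $m \times n$. Let $X_1, \dots, X_n$ denote
the columns of $\Gamma$ and let $H_k$ denote the span of all column vectors except the
$k$-th. Then for every $\gamma, \rho \in (0, 1)$ and every $\varepsilon > 0$ one has

$$\mathbb{P}\Big(\inf_{x \in F} |\Gamma x| \le \varepsilon\rho\, n^{-1/2}\Big) \le \frac{1}{\gamma n}\sum_{k=1}^n \mathbb{P}\big(\mathrm{dist}(X_k, H_k) \le \varepsilon\big),$$

where $F = S^{n-1} \cap \mathrm{Incomp}(\gamma n, \rho)$.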

The next lemma is similar to [26, Lemma 3.8]. To prove it one would
repeat the proof of that lemma, replacing [26, Lemma 3.7] used there with our
Lemma 3.5.

Lemma 6.2. Let $r \in (2, 3]$ and $\Gamma$ be a random matrix as in Theorem 1.3.
Let $X_1, \dots, X_n$ denote its column vectors, and consider the subspace $H_n =
\mathrm{span}(X_1, \dots, X_{n-1})$. Then there exists a positive constant $\gamma_0 = \gamma_0(r, \mu, a_1, a_3) < 1$
such that if $a_4 > 1 - \gamma_0$ then for every $\varepsilon > 0$ one has



p(dist(X n ,# n ) < e and \\T\\ < aX /2 ) < c{en L ^ + p r n 
where $c$ depends on $r$, $\mu$, $a_1$, $a_3$, and $a_4$.



2-r , 
2 
Now we are ready for the proof of Theorem 1.3.

Proof of Theorem 1.3. Without loss of generality we assume $\varepsilon \le a_1/2$
(otherwise choose $C = 2/a_1$ and we are done). We also assume that $r \le 3$
(otherwise we pass to $r_0 = \min\{3, r\}$).
Consider the event

$$E := \{\omega : \exists x \in S^{n-1} \text{ s.t. } |\Gamma x| \le t n^{-1/2}\}.$$

By equivalence (3) we are to estimate $\mathbb{P}(E)$ with an appropriate value of $t$
(which will be specified later).

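As in the proof of Theorem 1.2, we split the set $E$ into the sets $E_C$ and $E_I$
defined as follows:

$$E_C = \{\omega : \exists x \in S^{n-1} \cap \mathrm{Comp}(m,\rho) \text{ s.t. } |\Gamma x| \le t n^{-1/2}\},$$
$$E_I = \{\omega : \exists x \in S^{n-1} \cap \mathrm{Incomp}(m,\rho) \text{ s.t. } |\Gamma x| \le t n^{-1/2}\}.$$

Define $\Omega_0 := \{\omega : \|\Gamma\| \le a_1\sqrt{n}\}$. We will estimate $\mathbb{P}(E)$ using the union
bound in the inclusion

$$E \subset (E_C \cap \Omega_0) \cup E_I \cup \Omega_0^c. \tag{24}$$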

Case 1: Probability of $E_C \cap \Omega_0$. The proof of this case is an almost line-by-line
repetition of the corresponding proof in Theorem 1.2 (see Case 1 there). Let
$m \le n$ and $\rho \in (0, 1)$ be specified later. Using the approximation argument and
the union bound as in the proof of Case 1 in Theorem 1.2, and choosing

p:=min[i,^}, 7 := m = 7 n, (25) 

we obtain

$$\mathbb{P}(E_C \cap \Omega_0) \le e^{-b_2 n/2}, \tag{26}$$

provided that

$$\frac{2t}{a_1} \le \rho. \tag{27}$$

Case 2: Probability of $E_I$. We work on the set $\mathrm{Incomp}(m, \rho)$, where $m = \gamma n$
and $\gamma$, $\rho$ are chosen in (25).

Using Lemma 6.1 with $\varepsilon = t/\rho$, and also applying Lemma 6.2, we get

1 n 

P(£j) < — VP(dist(A%tf fc ) < t/p) 
7 n ^— ^ v ' 

1 n 

< — ^jp(dist(X fc ,# fc ) <t/p & ||r|| < 0l Vn) +P(|]r|| > aix/n)} 

^ k=i 
1 n 

< — ^{c(en^ + /T?) + e" a2n } 
^ fc=i 

< -(en^r +n^-) + -e- a2n . (28) 

7 7 
Also notice that our choice $t = \varepsilon\rho$ and our assumption $\varepsilon \le a_1/2$ guarantee
that $t$ satisfies (27).

To finish the proof, we take probabilities in (24), and we use the estimates
for $\mathbb{P}(E_C \cap \Omega_0)$ and for $\mathbb{P}(E_I)$ obtained in (26) and (28), respectively, combined
with the estimate $\mathbb{P}(\Omega_0^c) \le e^{-a_2 n}$ coming from condition (ii). This way we
obtain

¥(E) < e~ b2n/2 + -(en^ + n^) + -e~ a2Tl + e" a2n < C(en^r + n ^r) 

7 7 

for a suitable constant C. ■ 



7 Proof of Proposition 4.1



Take an arbitrary $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ with $|x| = 1$. For $a > 0$ (a parameter
whose value will be specified later), define a set of "good" rows as follows:

$$J = J(a) = \Big\{ j \in \{1, \dots, N\} : \mathbb{E}\sum_{i=1}^n \xi_{ji}^2 x_i^2 \ge a \Big\}.$$

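Suppose that the cardinality of the set $J$ is $|J| = \alpha N$ for some $\alpha \in [0, 1]$. Note
that for each index $j = 1, \dots, N$ we have

$$\mathbb{E}\sum_{i=1}^n \xi_{ji}^2 x_i^2 \le \max_{1 \le i \le n} \mathbb{E}\xi_{ji}^2 \le \max_{1 \le i \le n} \big(\mathbb{E}|\xi_{ji}|^r\big)^{2/r} \le \mu^2.$$

Then on one hand we have

$$\mathbb{E}\Big(\sum_{j=1}^N \sum_{i=1}^n \xi_{ji}^2 x_i^2\Big) = \mathbb{E}\Big(\sum_{j \in J} \sum_{i=1}^n \xi_{ji}^2 x_i^2\Big) + \mathbb{E}\Big(\sum_{j \in J^c} \sum_{i=1}^n \xi_{ji}^2 x_i^2\Big) \le \mu^2 \alpha N + a(1 - \alpha)N,$$

while on the other hand, using condition (iii),

$$\mathbb{E}\Big(\sum_{j=1}^N \sum_{i=1}^n \xi_{ji}^2 x_i^2\Big) = \sum_{i=1}^n \Big(\mathbb{E}\sum_{j=1}^N \xi_{ji}^2\Big) x_i^2 \ge \sum_{i=1}^n a_3^2 N x_i^2 = a_3^2 N.$$

Hence we have $\mu^2 \alpha N + a(1 - \alpha)N \ge a_3^2 N$, so (for $a < \mu^2$) $\alpha$ satisfies

$$\alpha \ge \frac{a_3^2 - a}{\mu^2 - a}. \tag{29}$$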
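
The counting argument behind (29) can be illustrated on a concrete sparse variance profile. The Python sketch below is a toy model and not taken from the paper: each entry is assumed to have variance 1 with probability `density` and to be null otherwise, so that $\mu^2 = 1$ is an admissible bound, and $a_3^2$ is taken to be the smallest normalized column sum of variances. The function names `good_row_fraction` and `good_rows_demo` are hypothetical.

```python
import math
import random


def good_row_fraction(var: list[list[float]], x: list[float], a: float) -> float:
    """Fraction alpha = |J(a)| / N of rows j with sum_i var[j][i] * x_i^2 >= a."""
    good = sum(
        1 for row in var
        if sum(v * xi * xi for v, xi in zip(row, x)) >= a
    )
    return good / len(var)


def good_rows_demo(N: int = 200, n: int = 50, density: float = 0.3,
                   seed: int = 0) -> tuple[float, float]:
    """Return (alpha, lower bound from (29)) for a random 0/1 variance profile."""
    rng = random.Random(seed)
    # var[j][i] plays the role of E xi_{ji}^2: 1 for a "live" entry, 0 for a null one.
    var = [[1.0 if rng.random() < density else 0.0 for _ in range(n)]
           for _ in range(N)]
    mu_sq = 1.0  # all variances are at most 1 in this toy model
    # Condition (iii): every column has total variance at least a_3^2 * N.
    a3_sq = min(sum(var[j][i] for j in range(N)) for i in range(n)) / N
    # A random unit vector x.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in g))
    x = [c / norm for c in g]
    a = a3_sq / 2.0  # the threshold used later in the proof
    alpha = good_row_fraction(var, x, a)
    lower = (a3_sq - a) / (mu_sq - a)
    return alpha, lower


if __name__ == "__main__":
    alpha, lower = good_rows_demo()
    print(f"alpha = {alpha:.3f}, lower bound from (29) = {lower:.3f}")
```

In this toy model the observed fraction of good rows is typically much larger than the bound in (29), which only uses the averaged information from condition (iii).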
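Note that for each $j = 1, \dots, N$, the $j$-th entry of $\Gamma x$ is $(\Gamma x)_j = \sum_{i=1}^n \xi_{ji} x_i$.
Define $f_j := \big|\sum_{i=1}^n \xi_{ji} x_i\big|$, so

$$|\Gamma x|^2 = \sum_{j=1}^N f_j^2.$$

Clearly $f_1, \dots, f_N$ are independent. For any $t, \tau > 0$ we have

$$\mathbb{P}\big(|\Gamma x|^2 \le t^2 N\big) = \mathbb{P}\Big(\sum_{j=1}^N f_j^2 \le t^2 N\Big) = \mathbb{P}\Big(\tau N - \frac{\tau}{t^2}\sum_{j=1}^N f_j^2 \ge 0\Big) \le \mathbb{E}\exp\Big(\tau N - \frac{\tau}{t^2}\sum_{j=1}^N f_j^2\Big) = e^{\tau N}\prod_{j=1}^N \mathbb{E}\exp\Big(-\frac{\tau f_j^2}{t^2}\Big). \tag{30}$$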

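From Lemma 3.1 we know that for every $j = 1, \dots, N$,

$$\mathbb{P}(f_j > \lambda) \ge \Big(\frac{\big[\mathbb{E}\sum_{i=1}^n \xi_{ji}^2 x_i^2 - \lambda^2\big]_+}{8\mu^2}\Big)^{r/(r-2)} =: \beta_j(r). \tag{31}$$

Note that for every $j \in J$ one has

$$\beta_j(r) \ge \Big(\frac{[a - \lambda^2]_+}{8\mu^2}\Big)^{r/(r-2)}. \tag{32}$$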



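For arbitrary $t > 0$, $\eta > 0$ and $\lambda > 0$, set $\tau := \eta t^2/\lambda^2$. For each $j = 1, \dots, N$ we
have

$$\mathbb{E}\exp\Big(-\frac{\tau f_j^2}{t^2}\Big) = \int_0^1 \mathbb{P}\Big(\exp\Big(-\frac{\eta f_j^2}{\lambda^2}\Big) > s\Big)\,ds = \int_0^{e^{-\eta}} \mathbb{P}\Big(\exp\Big(\frac{\eta f_j^2}{\lambda^2}\Big) < \frac{1}{s}\Big)\,ds + \int_{e^{-\eta}}^1 \mathbb{P}\Big(\exp\Big(\frac{\eta f_j^2}{\lambda^2}\Big) < \frac{1}{s}\Big)\,ds$$
$$\le e^{-\eta} + \mathbb{P}(f_j < \lambda)\,(1 - e^{-\eta}).$$

Choosing $\eta = \ln 2$ and applying (31), we obtain

$$\mathbb{E}\exp\Big(-\frac{\tau f_j^2}{t^2}\Big) \le e^{-\eta} + (1 - \beta_j(r))(1 - e^{-\eta}) = 1 - \frac{\beta_j(r)}{2} \le \exp\Big(-\frac{\beta_j(r)}{2}\Big).$$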

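Since $\tau \le t^2/\lambda^2$, inequality (30) implies

$$\mathbb{P}\big(|\Gamma x|^2 \le t^2 N\big) \le e^{\tau N}\prod_{j=1}^N \mathbb{E}\exp\Big(-\frac{\tau f_j^2}{t^2}\Big) \le e^{(t^2/\lambda^2)N}\prod_{j \in J} e^{-\beta_j(r)/2}. \tag{33}$$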
Taking $a = a_3^2/2$ and $\lambda = a_3/2$ and using (32) we observe that for every
$j \in J$ we have $\beta_j \ge \big(\frac{a_3^2}{32\mu^2}\big)^{r/(r-2)}$. Also note this choice of $a$ and (29) imply
$\alpha \ge a_3^2/(2\mu^2)$. Now let

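$$t^2 := \frac{a_3^4}{2^5\mu^2}\Big(\frac{a_3^2}{2^5\mu^2}\Big)^{r/(r-2)}.$$

Then continuing from (33) we obtain

$$\mathbb{P}\big(|\Gamma x| \le t\sqrt{N}\big) \le \exp\Big(\frac{4t^2}{a_3^2}N - \frac{\alpha N}{2}\Big(\frac{a_3^2}{32\mu^2}\Big)^{r/(r-2)}\Big) \le \exp\Big(-\frac{a_3^2}{8\mu^2}\Big(\frac{a_3^2}{32\mu^2}\Big)^{r/(r-2)} N\Big).$$

This completes the proof. ■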